ABSTRACT

We measure the redshift space reduced void probability function (VPF) for 
2dFGRS volume limited galaxy samples covering the absolute magnitude range 
$M_{b_J} - 5\log_{10} h = -18$ to $-22$. Theoretically, the VPF connects the distribution of
voids to the moments of galaxy clustering of all orders, and can be used to discrimi- 
nate clustering models in the weakly non-linear regime. The reduced VPF measured 
from the 2dFGRS is in excellent agreement with the paradigm of hierarchical scaling 
of the galaxy clustering moments. The accuracy of our measurement is such that we 
can rule out, at a very high significance, popular models for galaxy clustering, in- 
cluding the lognormal distribution. We demonstrate that the negative binomial model 
gives a very good approximation to the 2dFGRS data over a wide range of scales, 
out to at least $20\,h^{-1}$ Mpc. Conversely, the reduced VPF for dark matter in a $\Lambda$CDM
universe does appear to be lognormal on small scales but deviates significantly
beyond $\sim 4\,h^{-1}$ Mpc. We find little dependence of the 2dFGRS reduced VPF on galaxy
luminosity. Our results hold independently in both the north and south Galactic pole 
survey regions. 

Key words: galaxies: statistics, clustering; cosmology: theory, large-scale structure, 
voids. 
1 INTRODUCTION

The galaxy distribution on the largest scales displays striking
geometrical features, such as walls, filaments and voids.
These features contain a wealth of information about both 
the linear and non-linear evolution of galaxy clustering. The 
nature of such clustering is dependent on many large and 
small scale effects, such as the cosmological parameters, 
galaxy and cluster environmental effects and history, the un- 
derlying dark matter distribution, and the way in which the 
dark and luminous components of the Universe couple and 
evolve. By probing the lower and higher orders of galaxy 
clustering, one thus hopes to shed light on those physical 
processes on which the clustering is dependent. 

The traditional tool used to analyse such distributions 
has been the 2-point correlation function (Davis & Peebles 
1983, Davis et al. 1988, Fisher et al. 1994, Loveday et al. 
1995, Norberg et al. 2001, Zehavi et al. 2002), providing a
description of clustering at the lowest orders. However, despite
its usefulness, the 2-point correlation function only provides 
a full clustering description in the case of a Gaussian distri- 
bution. A more complete account of clustering must include 
correlation functions of higher orders, although these are of- 
ten difficult to extract (see Croton et al. 2004 and Baugh et 
al. 2004 for an analysis of galaxy clustering in the 2dFGRS 
up to sixth order). 

In light of this, researchers have looked towards other
clustering statistics to glean higher-order information from 
a galaxy distribution. Historically, many astronomers have 
favoured using void statistics (e.g. Fry 1986, Maurogordato
& Lachièze-Rey 1987, Balian & Schaeffer 1989, Fry et al.
1989, Bouchet et al. 1993, Gaztañaga & Yokoyama 1993,
Vogeley et al. 1994). This approach is useful in that results 
are easily obtainable and are well supported by a solid theo- 
retical framework (White 1979, Fry 1986, Balian & Schaeffer 
1989), which directly relates the void distribution to that of 
galaxy clustering of higher orders. 

In this paper we employ the completed 2dFGRS dataset 
to undertake a detailed analysis of the void distribution us- 
ing the reduced void probability function. We rely heavily on 
the well established theoretical framework which connects 
the void distribution with galaxy clustering of all orders 
(Eq. 1 below). The distribution of voids and the moments 
of galaxy clustering of all orders are known to be intimately 
linked, and the study of one can reveal information about 
the other which would otherwise be difficult to measure. Our 
goal is thus to use the reduced void probability function to 
investigate if galaxy clustering in the 2dFGRS obeys a hi- 
erarchy of scaling, and on what physical scales this scaling 
holds. We explore a number of phenomenological models of 
galaxy clustering which exhibit hierarchical scaling, and use 
these models to help clarify the way in which higher-order 
clustering is constructed (footnote 1).

This paper is organised as follows. In Section 2 we give
a brief review of the theory behind the void statistics to be
employed in our analysis. In Section 3 we present the 2dFGRS
data set, and in Section 4 the counts-in-cells method
we use to measure the void statistics is explained. Our results
are presented in Section 5, and in Section 6 we provide
a discussion and summary of our conclusions. Throughout,
we adopt standard present day values of the cosmological
parameters to compute comoving distance from redshift: a
density parameter $\Omega_m = 0.3$ and a cosmological constant
$\Omega_\Lambda = 0.7$.

Footnote 1: Recently Hoyle et al. (2004) also measured the VPF of the
2dFGRS galaxy distribution; however, their analysis focused more
on the physical properties of voids in the 2dFGRS volume, rather
than the hierarchical nature of galaxy clustering itself.



2 VOID STATISTICS 

2.1 The Void Probability Function 

For a given distribution of galaxies, the count probability
distribution function (CPDF), $P_N(V)$, is defined as the
probability of finding exactly $N$ galaxies in a cell of volume
$V$ randomly placed within the sample. In the case where
$N = 0$ we have the void probability function (VPF), $P_0(V)$.
A choice of spherical cells with which to sample the distribution
makes $P_0$ a function of sphere radius $R$ only. The
VPF can be related to the hierarchy of $p$-point correlation
functions by (White 1979):



$$P_0(R) = \exp\left[\sum_{p=1}^{\infty} \frac{(-\bar{N}(R))^p}{p!}\,\bar{\xi}_p(R)\right] \qquad (1)$$



Here $\bar{N}$ is the average number of objects in a cell of volume
$V$, and $\bar{\xi}_p$ is the $p$th order correlation function averaged over
$V$. A completely random (Poisson) distribution has $\bar{\xi}_p = 0$
for all $p > 1$, and thus $P_0$ reduces to a simple analytic
expression:



$$P_0(R) = \exp\left[-\bar{N}(R)\right] \quad \text{(Poisson)} \qquad (2)$$



Any departure from this relation is therefore a signature of 
the presence of clustering. 

2.2 Hierarchical Scaling 

The idea that higher-order clustering arises in a hierarchi- 
cal fashion from the 2-point correlation function appears 
naturally in perturbation theory and also in the highly non- 
linear regime of gravitational clustering (e.g. Peebles 1980), 
and is supported by much observational evidence (e.g. Maurogordato
& Lachièze-Rey 1987, Fry et al. 1989, Gaztañaga
1992, Bouchet et al. 1993, Bonometto et al. 1995, Benoist et 
al. 1999, see Bernardeau et al. 2002 for a review). The con- 
cept can be generalised by assuming that each p-point cor- 
relation function depends only on the product of the 2-point 
correlation function and a dimensionless scaling coefficient, 



$$\bar{\xi}_p(R) = S_p\,\bar{\xi}^{\,p-1}(R) \qquad (3)$$



where we have dropped the subscript 2 for the 2-point cor- 
relation function on the right-hand side for convenience (see 
Baugh et al. 2004 and Croton et al. 2004 for the measured
values of $S_p$ up to $p = 6$ in redshift space for the 2dFGRS).

The hierarchical idea is directly applicable to the VPF, 
which is itself dependent on an infinite sum of p-point cor- 
relation functions. The hierarchical assumption allows us to 
remove the higher-order correlation functions from Eq. 1: 



$$P_0(R) = \exp\left[\sum_{p=1}^{\infty} \frac{(-\bar{N})^p}{p!}\,S_p\,\bar{\xi}^{\,p-1}\right] \qquad (4)$$

Furthermore, the above scaling relation allows us to express
the VPF as a function of $\bar{N}\bar{\xi}$ only, where the scaling variable
$\bar{N}\bar{\xi}$ approximately represents the average number of galaxies
in a cell in excess of that expected given the mean density of
the sample. We formalise this idea by first considering the
analytic VPF expression for a purely random sample (Eq. 2).
For the hierarchical situation, we can define a parameter $\chi$
with $P_0 = e^{-\bar{N}\chi}$, called the reduced void probability function
(see Fry 1986):

$$\chi = -\ln(P_0)/\bar{N}. \qquad (5)$$

We note here that, independent of the hierarchical assumption,
$\chi$ normalises out the Poisson contribution to the distribution,
and it is clear that the effects of clustering will
appear as values of $\chi < 1$. Combining Eqs. 4 and 5, the reduced
VPF takes the form

$$\chi(\bar{N}\bar{\xi}) = \sum_{p=1}^{\infty} \frac{S_p}{p!}\,(-\bar{N}\bar{\xi})^{p-1}. \qquad (6)$$

This exhibits the scaling advertised above, and the shape
of $\chi(\bar{N}\bar{\xi})$ thus characterises the distribution of voids. If the
scaling relation assumption holds, we expect galaxy
samples of different density and clustering strength to all
collapse onto one universal curve, since all are a function of
the same scaling variable. The curve will not be universal for
different magnitude ranges if it turns out that the coefficients
$S_p$ are a strong function of galaxy magnitude. The values of
$S_p$ have recently been shown to depend at most only weakly
on magnitude (see Croton et al. 2004).
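
As an illustration of how the expansion in Eq. 6 behaves, the truncated series can be evaluated numerically for any assumed set of coefficients $S_p$. The following is a minimal Python sketch and not part of the original analysis: the function names `chi_series` and `negative_binomial_coefficients` are hypothetical, and the check uses the negative binomial coefficients $S_p = (p-1)!$ quoted in Section 2.3.2, for which the series sums to $\ln(1 + \bar{N}\bar{\xi})/\bar{N}\bar{\xi}$ when $\bar{N}\bar{\xi} < 1$.

```python
import math


def chi_series(x, s_coeffs):
    """Truncated Eq. 6: sum over p >= 1 of S_p / p! * (-x)**(p - 1).

    x        -- the scaling variable Nbar * xibar
    s_coeffs -- [S_1, S_2, S_3, ...]; coefficients beyond the list are
                treated as zero (this truncation is an assumption of the sketch)
    """
    return sum(s / math.factorial(p) * (-x) ** (p - 1)
               for p, s in enumerate(s_coeffs, start=1))


def negative_binomial_coefficients(p_max):
    """S_p = (p - 1)! for p = 1 .. p_max (negative binomial model)."""
    return [math.factorial(p - 1) for p in range(1, p_max + 1)]


x = 0.3
poisson = chi_series(x, [1])          # only S_1 = 1: chi = 1
gaussian = chi_series(x, [1, 1])      # S_1 = S_2 = 1: chi = 1 - x / 2
negbin = chi_series(x, negative_binomial_coefficients(40))
closed_form = math.log(1 + x) / x     # Eq. 8, valid as a series for x < 1
```

For $\bar{N}\bar{\xi} > 1$ the truncated sum is no longer reliable and the closed forms of Section 2.3 should be used instead.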

In the hierarchical picture, when $\bar{N}\bar{\xi} \ll 1$ one always
recovers the Poisson VPF, $\chi(\bar{N}\bar{\xi}) = 1$, regardless of the
actual clustering pattern or its strength. In the regime where
$\bar{N}\bar{\xi} < 1$ we see from Eq. 6 that the reduced void probability
function is dominated by the Gaussian contribution: $1 - \frac{1}{2}\bar{N}\bar{\xi}$.
Thus the interesting observational window, where we
can separate different clustering models, comes for values of
$\bar{N}\bar{\xi}$ larger than unity. In practice, this only seems to happen
at scales $R$ larger than a few $h^{-1}$ Mpc, where $\bar{N} \sim R^3$ is large
and dominates $\bar{\xi} \sim R^{-2}$. On smaller scales, where $\bar{\xi} > 1$, $\bar{N}\bar{\xi}$
will always be small, and galaxy samples will typically be
too sparse to show measurable deviations from the Gaussian
contribution. Thus, it should be stressed that the VPF is a
good discriminant of weakly non-linear clustering only. In
the highly non-linear regime voids do not provide us with
much information.

Although the expansion given in Eq. 6 is technically
only valid for small values of $\bar{N}\bar{\xi}$, the implications for clustering
do extend beyond this. For large values of $\bar{N}\bar{\xi}$, models
with different hierarchical amplitudes $S_p$ give different reduced
void probabilities $\chi$: as $\bar{N}\bar{\xi}$ increases the value of $\chi$
gets smaller and the resulting VPF gets larger (with respect
to the corresponding Poisson case). The Gaussian CPDF
($S_p = 0$ for $p > 2$) produces the smallest values of $\chi$ and therefore the
largest deviations in the VPF. As we will illustrate with the
models below, larger values of $S_p > 0$ will result in larger
values of $\chi(\bar{N}\bar{\xi})$.

2.3 Phenomenological Models 

In presenting our reduced VPF results, we follow the lead of
Fry (1986) and Fry et al. (1989) and compare with a number
of model scaling relations that differ in the way they fix the
scaling coefficients $S_p$. We give a brief description of these
models here, and refer the reader to the cited papers and
references therein for further details. In Fig. 1 we summarise
the behaviour of each.

Figure 1. Reduced void probability $\chi$ for different models (left to
right at $\chi \sim 0.4$): Gaussian (long-dashed-dotted, Eq. 12), minimal
(long-dashed, Eq. 7), BBGKY (short-dashed-dotted, Eq. 10 with
$Q = 2/3$), negative binomial (continuous, Eq. 8), thermodynamic
(dotted, Eq. 9), lognormal (short-dashed) and BBGKY (short-dashed-dotted,
Eq. 10 with $Q = 1$).

2.3.1 Minimal model 

The first model is the so-called minimal cluster model, the 
motivation of which is to consider a clumpy galaxy distribu- 
tion of clusters, the cluster distribution in space itself being 
Poisson with a Poisson galaxy occupancy. This is reminis- 
cent of the halo model (e.g. Cooray & Sheth 2002) but with a 
Poisson halo/cluster profile. Evaluating the set of $S_p$'s from
the distribution function generated by this model leads to a
functional form for $\chi$ of

$$\chi = (1 - e^{-\bar{N}\bar{\xi}})/\bar{N}\bar{\xi} \quad \text{(minimal)}, \qquad (7)$$
$$S_p = 1 \quad (\text{Skewness}: S_3 = 1).$$

Fry (1986) speculated that this model represents a lower
bound on the allowable functions $\chi$ in any consistent
hierarchical model.

2.3.2 Negative binomial model 

The second model, commonly called the negative binomial
model, has been used in a number of fields with different
physical motivations (Klauder & Sudarshan 1968,
Carruthers & Shin 1983, Carruthers & Minh 1983, Fry
1986, Elizalde & Gaztañaga 1992, Gaztañaga & Yokoyama
1993). After a set of $T$ independent trials with probability
$q$ for "success" and $p = 1 - q$ for "failure", the probability
of having $S$ successes and $F = T - S$
failures is given by the binomial distribution:
$P(S) = \frac{(F+S)!}{S!\,F!}\,(1-q)^F q^S$. The negative binomial
distribution describes the probability of having $S$
successes after a fixed number $F$ of failures:
$P(S) = \frac{(F+S-1)!}{S!\,(F-1)!}\,(1-q)^F q^S$. Note that in
the binomial case what is fixed is the total number of trials.

We can identify a "success" as finding a galaxy in a
cell, so that $P_N = P(N = S)$ is the CPDF. The fixed number
of failures, $F$, is assumed to be inversely proportional
to $\bar{\xi}$ (the larger the $\bar{\xi}$, the smaller the number of failures
to count a galaxy in a cell). The probability for a failure $p$
is assumed to be proportional to the product $\bar{N}\bar{\xi}$ (because
of clustering there is an $\bar{N}\bar{\xi}$ rms excess of galaxies within
a cell with $\bar{N}$ density: the larger this clumpiness the larger
the probability to miss galaxies in a random cell). After fixing
the proportionality constants, this leads to $F = 1/\bar{\xi}$ and
$p = \bar{N}\bar{\xi}/(1 + \bar{N}\bar{\xi})$ (for a different derivation see Gaztañaga
& Yokoyama 1993). This model is a discrete version of the
gamma probability distribution (see Gaztañaga, Fosalba &
Elizalde 2000). The reduced VPF and cumulants in this case
are:



$$\chi = \ln(1 + \bar{N}\bar{\xi})/\bar{N}\bar{\xi} \quad \text{(negative binomial)}, \qquad (8)$$
$$S_p = (p-1)! \quad (\text{Skewness}: S_3 = 2).$$



2.3.3 Thermodynamic model 

The third model was first suggested by Saslaw and Hamilton 
(1984) and arose from a thermodynamic theory of the prop- 
erties of gravitational clustering. The original model had a 
fixed degree of virialization (temperature or density vari- 
ance) for all cell sizes, but such behaviour is inconsistent 
with observations. The model was later extended (see e.g. 
Fry 1986) to include a different level of virialization at each 
scale, to be identified with the variance $\bar{\xi}$ as a function of
scale. The result is:

$$\chi = [(1 + 2\bar{N}\bar{\xi})^{1/2} - 1]/\bar{N}\bar{\xi} \quad \text{(thermodynamic)}, \qquad (9)$$
$$S_p = (2p-3)!! \quad (\text{Skewness}: S_3 = 3),$$

where $(2p-3)!! = (2p-3)(2p-5)(2p-7)\ldots$ and the product
terminates at the last positive factor.
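
The three closed-form models introduced so far (Eqs. 7, 8 and 9) are simple enough to compare directly. The sketch below is illustrative only and assumes nothing beyond those equations; the function names are hypothetical. It shows that all three reduce to the Gaussian form $1 - \frac{1}{2}\bar{N}\bar{\xi}$ quoted in Section 2.2 when $\bar{N}\bar{\xi}$ is small, and that at large $\bar{N}\bar{\xi}$ a larger skewness $S_3$ gives a larger $\chi$.

```python
import math


def chi_minimal(x):
    """Eq. 7, with x = Nbar * xibar (S_3 = 1)."""
    return -math.expm1(-x) / x


def chi_negative_binomial(x):
    """Eq. 8 (S_3 = 2)."""
    return math.log1p(x) / x


def chi_thermodynamic(x):
    """Eq. 9 (S_3 = 3)."""
    return (math.sqrt(1.0 + 2.0 * x) - 1.0) / x


models = (chi_minimal, chi_negative_binomial, chi_thermodynamic)

# Small-x limit: every model approaches the Gaussian term 1 - x / 2.
small = 1e-3
small_x_residuals = [abs(f(small) - (1.0 - small / 2.0)) for f in models]

# Large-x behaviour: the models separate, ordered by their skewness.
large = 10.0
large_x_values = [f(large) for f in models]
```

The separation at $\bar{N}\bar{\xi} \gtrsim 1$ is what makes the reduced VPF a usable discriminant between these models.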



2.3.4 Lognormal distribution

The lognormal distribution (e.g. Coles & Jones 1991, Weinberg
& Cole 1993) is often used as a phenomenological
model for galaxy and dark matter clustering. Although no
analytic expression exists for the reduced void probability, it
can be estimated numerically (see above references) and is
found to behave similarly to the thermodynamic model, as
shown in Fig. 1 (note how the dotted and the short-dashed
lines overlap). As in the thermodynamic model, the lognormal
distribution also has a large skewness: $S_3 = 3 + \bar{\xi}$ (which
tends exactly to the thermodynamic value $S_3 \to 3$ on large
scales where $\bar{\xi} \to 0$). In fact, it should be noted that the lognormal
model is not truly hierarchical, as it does not have
constant moments $S_p$, but in practice the variations have
little effect on the reduced void distribution.

2.3.5 BBGKY model

The BBGKY model of Fry (1984) provides a prescription for
$\chi$ and $S_p$ as an asymptotic solution to the BBGKY kinetic
equations:

$$\chi = 1 - (\gamma + \ln 4Q\bar{N}\bar{\xi})/8Q \quad \text{(BBGKY)}, \qquad (10)$$
$$S_p = (4Q)^{p-2}\,\frac{p}{2(p-1)},$$

where $\gamma = 0.57721\ldots$ is Euler's constant. This asymptotic
solution is only a good approximation for large values of
$\bar{N}\bar{\xi}$. When $\bar{N}\bar{\xi}$ becomes small, for completeness we simply
match it to the nearest model.

The skewness in the BBGKY model contains a free parameter,
$S_3 = 3Q$, with the restriction that $Q > 1/3$. Fry
(1984) used $Q \sim 1$, which was close to the then observed
$S_3 \sim 3$ value measured from the 3-point function in real
space (inferred from projected maps). Croton et al. (2004)
and Baugh et al. (2004) have since shown that $S_3$ is in
fact closer to $S_3 = 2$ in the 2dFGRS, corresponding to the
case where $Q = 2/3$. Both possibilities are shown as short-dashed-dotted
lines in Fig. 1, with the upper curve for $Q = 1$
and the lower curve for $Q = 2/3$. Since we later show that
neither of these $Q$ values with the BBGKY model is able to
match the data very well, for the sake of clarity we omit the
lower $Q = 2/3$ curve in subsequent figures. The upper curve
is retained in order to demonstrate the range of possible $\chi$
values that a hierarchical model may have.

2.3.6 Poisson and Gaussian distributions

In addition to the above models we also use the analytic expressions
of the reduced VPF for purely Poisson and Gaussian
distributions. Trivially, from Eq. 6 we see that

$$\chi = 1 \quad \text{(Poisson)}, \qquad (11)$$

and

$$\chi = 1 - \frac{1}{2}\bar{N}\bar{\xi} \quad \text{(Gaussian)}, \qquad (12)$$
$$S_p = 0 \ \text{for}\ p > 2 \quad (\text{Skewness}: S_3 = 0).$$

The latter only makes sense for small values of $\bar{N}\bar{\xi}$, but note
that even when the underlying distribution is not Gaussian,
the above expression always gives a good approximation to
the void probability in the limit of small $\bar{N}\bar{\xi}$.



3 THE DATA SETS 

3.1 The 2dFGRS Data Set 

In our analysis we use the completed 2dFGRS (Colless et al. 
2003). The catalogue is sourced from a revised and extended 
version of the APM galaxy catalogue (Maddox et al. 1990), 
and the targets are galaxies with extinction-corrected magnitudes
brighter than $b_J = 19.45$. Our galaxy sample contains
a total of 221,414 high quality redshifts. The median
depth of the full survey, to a nominal magnitude limit of
$b_J \sim 19.45$, is $z \sim 0.11$. We consider the two large contiguous
survey regions, one in the south Galactic pole (SGP)
and one towards the north Galactic pole (NGP), and re- 
strict our attention to the parts of the survey with high 
redshift completeness (> 70%). Full details of the 2dFGRS
and the construction and use of the mask quantifying the
completeness of the survey can be found in Colless et al.
(2001, 2003).

Table 1. Properties of the combined 2dFGRS SGP and NGP volume limited catalogues (VLCs). Columns 1 and 2 give the faint and
bright absolute magnitude limits that define the sample. Column 3 gives the median magnitude of the sample, computed using the
Schechter function parameters quoted by Norberg et al. (2002). Columns 4, 5 and 6 give the number of galaxies, the mean number
density and the mean inter-galaxy separation for each VLC, respectively. Columns 7 and 8 state the redshift boundaries of each sample
for the nominal apparent magnitude limits of the survey; columns 9 and 10 give the corresponding comoving distances. Finally, column
11 gives the combined SGP and NGP volume. All distances are comoving and are calculated assuming standard cosmological parameters
($\Omega_m = 0.3$ and $\Omega_\Lambda = 0.7$).

Mag. range            Median mag.        N_G     Density                d_mean      z_min   z_max   D_min       D_max       Volume
M_bJ - 5 log10 h      M_bJ - 5 log10 h           10^-3 / h^-3 Mpc^3     h^-1 Mpc                    h^-1 Mpc    h^-1 Mpc    10^6 h^-3 Mpc^3
-18.0    -19.0        -18.44             23290   9.26                   4.76        0.014   0.088   39.0        255.6       2.52
-19.0    -20.0        -19.39             44931   5.64                   5.62        0.021   0.130   61.1        375.6       7.97
-20.0    -21.0        -20.28             33997   1.46                   8.82        0.033   0.188   95.1        537.2       23.3
-21.0    -22.0        -21.16             6895    0.110                  20.9        0.050   0.266   146.4       747.9       62.8

A model accounting for the change in galaxy magnitude
due to redshifting of the $b_J$-filter bandpass (k-correction)
and galaxy evolution (e-correction) was adopted following
Norberg et al. (2002):

$$k(z) + e(z) = \frac{z + 6z^2}{1 + 20z^3}. \qquad (13)$$



This model gives the mean k+e-correction over the mix of 
different spectral types observed in the 2dFGRS sample, and 
was shown by Norberg et al. to accurately account for such 
observational effects when estimating 2dFGRS galaxy abso- 
lute magnitudes. 
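
To make the use of Eq. 13 concrete, the following Python sketch evaluates the k+e-correction, the comoving distance for the adopted cosmology ($\Omega_m = 0.3$, $\Omega_\Lambda = 0.7$), and the resulting absolute magnitude. It is a sketch under stated assumptions rather than the survey pipeline: a flat universe is assumed so that the luminosity distance is $(1+z)$ times the comoving distance, distances are in $h^{-1}$ Mpc, magnitudes are $M - 5\log_{10} h$, and all function names are hypothetical.

```python
import math

OMEGA_M = 0.3                 # density parameter adopted in the text
OMEGA_L = 0.7                 # cosmological constant adopted in the text
HUBBLE_DISTANCE = 2997.92458  # c / H0 in h^-1 Mpc (H0 = 100 h km/s/Mpc)


def k_plus_e(z):
    """Mean k+e-correction of Eq. 13."""
    return (z + 6.0 * z ** 2) / (1.0 + 20.0 * z ** 3)


def inverse_e(z):
    """1 / E(z) for a flat universe with matter and a cosmological constant."""
    return 1.0 / math.sqrt(OMEGA_M * (1.0 + z) ** 3 + OMEGA_L)


def comoving_distance(z, steps=200):
    """Comoving distance in h^-1 Mpc via Simpson's rule (steps must be even)."""
    if z == 0.0:
        return 0.0
    h = z / steps
    total = inverse_e(0.0) + inverse_e(z)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * inverse_e(i * h)
    return HUBBLE_DISTANCE * total * h / 3.0


def absolute_magnitude(apparent_mag, z):
    """M - 5 log10 h from an apparent magnitude, including the k+e-correction."""
    d_lum = (1.0 + z) * comoving_distance(z)   # flat-universe assumption
    return apparent_mag - 25.0 - 5.0 * math.log10(d_lum) - k_plus_e(z)


d_088 = comoving_distance(0.088)
d_130 = comoving_distance(0.130)
m_limit = absolute_magnitude(19.45, 0.130)
```

Because the redshift limits in Table 1 are rounded to three decimal places, distances computed this way are only expected to match the tabulated comoving distances approximately.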



3.2 Volume Limited Catalogues 

The 2dFGRS galaxy catalogue is magnitude-limited, mean- 
ing the survey is constructed by observing galaxies brighter 
than the fixed apparent magnitude limit of $b_J = 19.45$. A
magnitude-limited galaxy catalogue is not uniform in space, 
since intrinsically fainter objects may be missed even if they 
are relatively nearby, while the most luminous galaxies will 
be seen out to large distances. This non-uniformity of the 
magnitude-limited catalogue must be dealt with for a cor- 
rect statistical analysis, and the simplest way to do this with 
a catalogue the size of the 2dFGRS is by constructing a vol- 
ume limited catalogue (VLC) from the magnitude-limited 
sample. 

Volume limited catalogues are defined by choosing min- 
imum and maximum absolute magnitude limits. These lim- 
its, along with the intrinsic apparent magnitude limits of the 
survey, define minimum and maximum redshift boundaries 
via standard luminosity-distance relations (Peebles 1980). 
The VLC is built by selecting galaxies whose redshift lies 
within the minimum and maximum boundaries just deter- 
mined, and whose absolute magnitude lies within the spec- 
ified absolute magnitude limits. Such galaxies can be dis- 
placed to any redshift within the VLC volume and still re- 
main within the bright and faint apparent magnitude lim- 
its of the magnitude limited survey. Table 1 presents the 
properties of the combined NGP and SGP volume limited 
catalogues used in this paper. 



4 MEASURING THE GALAXY 
DISTRIBUTION 

To measure the void probability function we use the method 
of counts-in-cells. The survey volume is uniformly sampled
with a large number ($2.5 \times 10^7$) of randomly placed spheres of
fixed radius R, and we record the number of times a sphere 
contains exactly N galaxies. Our choice of massive over- 
sampling ensures a high level of statistical accuracy in the 
calculation (Szapudi 1998). The CPDF can then be found as 
the probability of finding exactly N galaxies in a randomly 
placed sphere: 

$$P_N(R) = \frac{N_N}{N_T}, \qquad (14)$$

where $N_N$ is the number of spheres that contain exactly $N$
galaxies out of the total number of spheres thrown down,
$N_T$. By definition, the void probability function is the
probability of finding an empty sphere:

$$P_0(R) = \frac{N_0}{N_T}. \qquad (15)$$

The mean number of galaxies expected inside a sphere of
radius $R$ is readily calculated from

$$\bar{N}(R) = \sum_N N P_N(R), \qquad (16)$$

and this estimation of $\bar{N}$ for each individual VLC is found to
be independent of scale and indistinguishable from that determined
from the known mean galaxy density. The volume
averaged 2-point correlation function, $\bar{\xi}_2$, is found directly
from the second moment of the CPDF:

$$\bar{\xi}_2(R) = \frac{\langle (N - \bar{N})^2 \rangle - \bar{N}(R)}{\bar{N}(R)^2}. \qquad (17)$$

We have also carried out an independent counts-in-cells 
analysis by placing the spheres at the positions of a regular 
spatial lattice that homogeneously oversamples the survey 
area. The results are insensitive to these details. 
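
The estimators in Eqs. 14 to 17, together with the reduced VPF of Eq. 5, can be sketched on a toy data set. The example below is an assumption-laden illustration, not the procedure applied to the 2dFGRS: it uses unclustered (Poisson) points in a unit cube, keeps every sphere fully inside the cube, ignores the survey mask and incompleteness, and throws far fewer spheres than the $2.5 \times 10^7$ used in the text. All names are hypothetical. For such a sample one expects $\bar{\xi}_2 \approx 0$ and $\chi \approx 1$ up to sampling noise.

```python
import math
import random


def counts_in_cells(points, radius, n_spheres, rng):
    """Throw n_spheres random spheres and count the points inside each one."""
    r2 = radius * radius
    counts = []
    for _ in range(n_spheres):
        # Keep the sphere entirely inside the unit cube (toy boundary handling).
        cx = rng.uniform(radius, 1.0 - radius)
        cy = rng.uniform(radius, 1.0 - radius)
        cz = rng.uniform(radius, 1.0 - radius)
        n = sum(1 for (x, y, z) in points
                if (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= r2)
        counts.append(n)
    return counts


def void_statistics(counts):
    """Return (P_0, Nbar, xibar_2, chi) from a list of cell counts."""
    n_t = len(counts)
    p0 = counts.count(0) / n_t                           # Eq. 15
    nbar = sum(counts) / n_t                             # Eq. 16
    variance = sum((n - nbar) ** 2 for n in counts) / n_t
    xibar = (variance - nbar) / nbar ** 2                # Eq. 17
    chi = -math.log(p0) / nbar                           # Eq. 5
    return p0, nbar, xibar, chi


rng = random.Random(42)
points = [(rng.random(), rng.random(), rng.random()) for _ in range(1000)]
counts = counts_in_cells(points, radius=0.06, n_spheres=1000, rng=rng)
p0, nbar, xibar, chi = void_statistics(counts)
```

The brute-force distance loop is chosen for clarity; a real survey-sized catalogue would need a spatial index and the completeness treatment described in Appendix A.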

The 2dFGRS has an inherent spectroscopic galaxy in- 
completeness which will change the results of any void anal- 
ysis (Colless et al. 2001). In addition, due to the irregular 
geometry of the survey boundaries it is difficult to guaran- 
tee that every sphere will be completely contained within 
the regions we wish to measure. Since the CPDF is sensi- 
tive to such effects we adopt a technique which accurately 
accounts for such deficiencies. This method is explained and 
tested in Appendix A (see also Croton et al. 2004) . 
4.1 Error Estimation

We estimate the error on our void statistics using the set 
of 22 mock 2dFGRS surveys described by Norberg et al. 
(2002; see also Cole et al. 1998). These mock catalogues 
have the same radial and angular selection function as the 
2dFGRS and have been convolved with the completeness 
mask of the survey. The mock catalogues are drawn from the 
Virgo Consortium's $\Lambda$CDM Hubble Volume simulation and
thus include sample variance due to large scale structure (see
Evrard et al. 2002 for a description of the Hubble Volume
simulation). The $1\sigma$ errors we quote correspond to the rms
scatter over the ensemble of mocks (see Norberg et al. 2001).
We have compared this estimate with an internal estimate
using a jackknife technique (Zehavi et al. 2002). In the jackknife
approach, the survey is split into subsamples. The error
is then the scatter between the measurements when each
subsample is omitted in turn from the analysis. The jackknife
gives comparable errors to the mock ensemble for the
VPF measurement.
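
The jackknife procedure described above can be sketched as follows. The text does not state the normalisation used, so this example assumes the standard delete-one form, in which the variance is $(n-1)/n$ times the summed squared scatter of the $n$ leave-one-out estimates; the function name and the toy numbers are hypothetical.

```python
import math


def jackknife_error(subsamples, statistic):
    """Delete-one jackknife error of `statistic`.

    subsamples -- list of lists; each inner list holds the measurements
                  (for example cell counts) belonging to one survey subsample
    statistic  -- function mapping a flat list of measurements to a number
    """
    n = len(subsamples)
    estimates = []
    for omitted in range(n):
        # Recompute the statistic with one subsample left out.
        kept = [value
                for index, sample in enumerate(subsamples) if index != omitted
                for value in sample]
        estimates.append(statistic(kept))
    mean = sum(estimates) / n
    # Standard delete-one normalisation (an assumption of this sketch).
    return math.sqrt((n - 1) / n * sum((e - mean) ** 2 for e in estimates))


def mean_of(values):
    return sum(values) / len(values)


def void_fraction(values):
    """Fraction of empty cells, i.e. an estimate of P_0."""
    return values.count(0) / len(values)


# Sanity check: for the sample mean the jackknife reproduces the usual
# standard error of the mean.
error_on_mean = jackknife_error([[1.0], [2.0], [3.0], [4.0], [5.0]], mean_of)

# Toy cell counts grouped by hypothetical survey subsample.
toy_counts = [[0, 1, 0, 2], [0, 0, 3, 1], [1, 0, 0, 0], [2, 1, 0, 1]]
error_on_p0 = jackknife_error(toy_counts, void_fraction)
```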



5 RESULTS 

We begin with Fig. 2, where we plot the reduced void
probability function, $\chi$, individually as a function of both
the mean galaxy number, $\bar{N}$, and the variance, $\bar{\xi}$, in the
top and bottom panels respectively. The physical scale
given on each top axis corresponds to values for the
$-20 > M_{b_J} - 5\log_{10} h > -21$ VLC only, and is included for
reference (for VLCs of different mean density the scale at
which a given $\bar{N}$ or $\bar{\xi}$ occurs will be different). Note that
for VLCs fainter than our reference this scale shifts to the
right in the top panel and to the left in the bottom panel.
The converse is true when considering brighter galaxies than
the reference.

The main feature of this figure is that neither $\bar{N}$ nor $\bar{\xi}$
individually shows hierarchical scaling when plotted against
$\chi$. Note that smaller values of $\chi$ correspond to larger deviations
from a Poisson distribution. Brighter galaxy samples
show behaviour which is closer to that of the Poisson distribution
for any given value of $\bar{N}$ or $\bar{\xi}$; however, this merely
reflects the fact that the brightest VLCs are also the sparsest
(Table 1).

We now test for hierarchical scaling in the 2dFGRS, as
outlined in Section 2.2. In Fig. 3 we plot the reduced void
probability function, $\chi$, as a function of the scaling variable
$\bar{N}\bar{\xi}$. In this way we eliminate the dependence of the void
probability on the variance and mean density. This figure
shows VLCs ranging in absolute magnitude from $-18$ to
$-22$. If a scaling between correlation functions of different
orders exists we expect to see all points for each catalogue
fall onto the same line. Again we provide a reference scale on
the top axis, given for the $-20 > M_{b_J} - 5\log_{10} h > -21$ VLC,
and note that for fainter galaxy samples this scale shifts to
the right and conversely for brighter samples. Over-plotted
are the scaling models previously discussed in Section 2.3:
(bottom to top) the Gaussian (Eq. 12), minimal (Eq. 7), negative
binomial (Eq. 8), thermodynamic (Eq. 9), lognormal,
and BBGKY (Eq. 10, $Q = 1$) models respectively.

Fig. 3 demonstrates the clear signature of hierarchical
scaling in the clustering moments of the 2dFGRS. All points
are seen to follow a tight path (within the error bars) out
to values of $\bar{N}\bar{\xi} \sim 30$, and sit close to the negative binomial
model prediction along this entire range. Such values
encompass galaxy clustering from the deeply non-linear to
the linear regime, revealing hierarchical scaling out to scales
of $\sim 20\,h^{-1}$ Mpc or more.

Figure 2. The 2dFGRS reduced VPF, $\chi = -\ln P_0/\bar{N}$, as a function
of (top) the mean galaxy number, $\bar{N}$, and (bottom) the variance
of the distribution, $\bar{\xi}$, as measured for volume limited catalogues
in varying luminosity bins (Table 1). Smaller values of $\chi$
imply larger deviations from a Poisson distribution. The reference
scale given on the top axis is for the $-20 > M_{b_J} - 5\log_{10} h > -21$
VLC only (each $\bar{N}$ and $\bar{\xi}$ value individually corresponds to different
scales for each VLC). Notice that neither variable displays
hierarchical scaling when plotted individually against $\chi$.

For comparison, in Fig. 3 we also present the dark matter
reduced VPF measured from the $\Lambda$CDM Hubble Volume
simulation (particle mass $2.3 \times 10^{12}\,h^{-1} M_\odot$) (Evrard et al.
2002). We independently analyse 100 randomly placed cubes
of side length $200\,h^{-1}$ Mpc (approximately equal in volume
to our $M^*$ galaxy volume limited sample), from which the
rms is then plotted. In contrast to the 2dFGRS galaxies,
the dark matter follows a lognormal distribution out to values
of $\bar{N}\bar{\xi} \sim 6$ (a scale of approximately $R \sim 4\,h^{-1}$ Mpc in
the simulation), but then deviates strongly on larger scales
(the last point plotted corresponds to $R = 10\,h^{-1}$ Mpc in the
simulation).

To highlight the differences between the 2dFGRS galaxy
reduced VPF and the negative binomial prediction, in Fig. 4
we show the fractional difference between the two. Also included
are the "bounding" models closest to the negative
binomial: the minimal and thermodynamic models. All 2dFGRS
points plotted are consistent with the negative binomial
model at the $2\sigma$ level. At larger values of $\bar{N}\bar{\xi}$ we find
some small departures from the negative binomial model,
and it is interesting to note that these deviations appear
the greatest for the faintest VLC. This could be explained
by the weak dependence of $S_p$ on galaxy luminosity found
by Croton et al. (2004), where fainter samples typically had
larger $S_p$ values than brighter samples (albeit with large error
bars). In the hierarchical picture, such an increase
would result in a value of $\chi$ closer to unity (Eq. 6).

Figure 3. The reduced VPF, $\chi = -\ln P_0/\bar{N}$, as a function of the scaling variable $\bar{N}\bar{\xi}$ for the four 2dFGRS galaxy VLCs from Table 1. The
dark matter reduced VPF, as measured from the $\Lambda$CDM Hubble Volume simulation, is shown as large diamonds. In all cases, smaller values
of $\chi$ imply larger deviations from a Poisson distribution. The reference scale given on the top axis is for the $-20 > M_{b_J} - 5\log_{10} h > -21$
VLC only (each $\bar{N}\bar{\xi}$ value corresponds to a different scale for each VLC). If hierarchical scaling is present in the galaxy distribution all
points should collapse onto a single line, which is clearly seen. The six curves represent the hierarchical models discussed in Section 2.3
(Eqs. 7 to 12).

Figure 4. The fractional difference between the negative binomial
model and the 2dFGRS, thermodynamic, and minimal reduced
VPFs. The reference scale given on the top axis is for
the $-20 > M_{b_J} - 5\log_{10} h > -21$ VLC only (each $\bar{N}\bar{\xi}$ value
corresponds to a different scale for each VLC). Some error bars have
been omitted for clarity. All 2dFGRS results are consistent with
the negative binomial model at the $2\sigma$ level.

An important feature of Fig. 3 is the inconsistency of
the reduced void probability function with a Gaussian distribution
across all scales considered (up to approximately
$30\,h^{-1}$ Mpc). On large scales where the galaxy correlation
functions become too small to measure independently, the
value of $\bar{N}$ is found to increase faster than $\bar{\xi}$ decreases, and
thus $\chi$ is still affected strongly by higher-order correlations.
It is clear that even in the quasi-linear regime, where one
would expect galaxy clustering to be very simple, higher-order
correlations still play a significant role in the make-up
of the large scale distribution.

To evaluate the robustness of the results seen in Fig. 3
we apply two tests to illustrate the degree of confidence we
should have in the existence of hierarchical scaling
in the 2dFGRS. Firstly, one of the most valuable features
of the 2dFGRS is that we have available data from two totally
independent regions on the sky, the SGP and NGP.
So far we have been calculating our void statistics from the
combined volume of the two, but it is useful to check that
the scaling properties still exist in the two regions independently.
This we do in the top panel of Fig. 5, where the large
symbols represent the SGP and small symbols the NGP. It
is immediately clear that galaxies from both the SGP and
NGP regions independently obey hierarchical scaling and reproduce
the negative binomial results discussed previously
to good accuracy.

Figure 5. Two tests of the scaling properties seen in Fig. 3 using
the reduced VPF, $\chi = -\ln P_0/\bar{N}$, as a function of the scaling variable
$\bar{N}\bar{\xi}$. (top) Independent SGP and NGP VLCs show identical
scaling to that seen in Fig. 3. Here the large symbols represent
the SGP result, and the small represent the NGP result. (bottom)
The same combined VLCs as in Fig. 3 but now diluted by
factors of 0.5 (large symbols) and 0.25 (small symbols). If hierarchical
scaling exists in the galaxy distribution, dilution should
make little difference to the results found in Fig. 3. For both panels,
the dotted curves represent the same six models plotted in
Fig. 3 and discussed in Section 2.3. Some error bars have been
omitted for clarity.

Secondly, we test the scaling properties seen in Fig. 3 by
calculating the reduced VPF for randomly diluted samples
of galaxies. Such dilutions leave the 2-point correlation
function unchanged, and within the hierarchical paradigm the
scaling exhibited in Fig. 3 should also remain unchanged.
This test is shown in the bottom panel of Fig. 5, where we
have diluted each of the VLCs used in Fig. 3 by factors
of 0.5 (large symbols) and 0.25 (small symbols). We again
see that the trend for hierarchical scaling exists and follows
the negative binomial model, consistent with our previous
conclusions.
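
The logic of the dilution test can be sketched with a toy example. This is not the survey pipeline: the counts-in-cells are drawn directly from a negative binomial distribution, randomly thinned, and the reduced VPF is re-measured. The variance estimator $\bar{\xi} = (\sigma^2 - \bar{N}) / \bar{N}^2$ and the form $\chi = \ln(1 + \bar{N}\bar{\xi}) / \bar{N}\bar{\xi}$ are standard results assumed here, and all names and input values are hypothetical.

```python
import numpy as np

def reduced_vpf(counts):
    """Estimate (Nbar * xibar, chi) from an array of counts-in-cells."""
    counts = np.asarray(counts, dtype=float)
    nbar = counts.mean()
    # Shot-noise-corrected variance: xibar = (var - Nbar) / Nbar^2.
    xibar = (counts.var() - nbar) / nbar**2
    p0 = np.mean(counts == 0)            # fraction of empty cells (the VPF)
    return nbar * xibar, -np.log(p0) / nbar

def chi_negative_binomial(x):
    """Assumed negative binomial form of the reduced VPF."""
    return np.log1p(x) / x

rng = np.random.default_rng(42)
nbar, xibar = 2.0, 1.5                   # illustrative values only
# numpy's negative_binomial(n, p) has mean n(1-p)/p, so n = 1/xibar and
# p = 1/(1 + Nbar*xibar) give mean Nbar and variance Nbar + Nbar^2 * xibar.
counts = rng.negative_binomial(1.0 / xibar, 1.0 / (1.0 + nbar * xibar), size=200_000)

results = {}
for f in (1.0, 0.5, 0.25):
    diluted = rng.binomial(counts, f)    # keep each galaxy with probability f
    x, chi = reduced_vpf(diluted)
    results[f] = (x, chi)
    print(f"f = {f:4.2f}  Nbar*xibar = {x:5.3f}  chi = {chi:5.3f}  "
          f"model = {chi_negative_binomial(x):5.3f}")
```

In this toy case dilution moves each sample to a smaller $\bar{N}\bar{\xi}$ while leaving $\bar{\xi}$ unchanged, and the measured $\chi$ continues to track the same model curve.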



6 DISCUSSION 

The 2dFGRS represents an enormous improvement in volume
and number of galaxies over previous surveys, such
as the CfA or the SSRS samples. Here we measure the
galaxy distribution over a wider range in both variance
($\bar{\xi} \sim 0.3 - 20$) and mean galaxy number ($\bar{N} \sim 10^{-4} - 10^{2}$).
The impact on the VPF can be seen by comparing Fig. 3
above to Fig. 7 in Gaztañaga & Yokoyama (1993), where the
CfA and SSRS data cannot discriminate between the negative
binomial and the thermodynamical models. As shown
here in Figs. 3 and 4, although the agreement is not always
perfect, the negative binomial does far better than
any of the other models considered in the literature. This
includes the lognormal distribution, which is close to the
thermodynamical model (Fig. 1) and is widely used as a
phenomenological clustering model. These results are valid
independently in the NGP and SGP regions of the survey,
and do not change when we randomly dilute the galaxy
samples (Fig. 5). The lognormal distribution does, however,
appear to be a good representation for the distribution of dark
matter on smaller scales (less than $\sim 4 h^{-1}$ Mpc), although
not at larger scales. The differences between the galaxy and
dark matter reduced VPFs can be understood by noting
the differences between their higher-order volume-averaged
correlation functions, as shown by Baugh et al. (2004).

The 2dFGRS reduced void probability function appears
to behave differently from the one presented by Vogeley et al.
(1994) for the CfA-1 and CfA-2 samples, which show more
scatter with magnitude and values well above the negative
binomial model (compare their Fig. 4 to our Fig. 3). Here
we do not observe any significant departure from the scaling
models on scales larger than $R \sim 8.5 h^{-1}$ Mpc as they had
previously found. In contrast, our results indicate hierarchical
scaling exists in the galaxy distribution out to scales of
at least $R \sim 20 h^{-1}$ Mpc.

Although some heuristic derivations exist for the negative
binomial distribution (see Section 2.3), we have not
found a satisfactory physical explanation for the very good
performance of this model. The value of the skewness for the
negative binomial model, $S_3 = 2$, is quite close to the direct
measurement in the 2dFGRS: $S_3 = 1.86$-$2.03$ (Baugh et al.
2004). Other phenomenological models, such as the
thermodynamical or the lognormal distribution, have larger values
for the skewness ($S_3 \sim 3$). A similar trend was found by
Baugh et al. for the higher order coefficients $S_4$, $S_5$ and $S_6$.
In this respect it is not totally surprising that the negative
binomial does better. The one freedom the reduced VPF has
is in the value of the scaling coefficients which appear in the
sum in Eq. 6. If these coefficients are found to match those
predicted by a particular hierarchical scaling model, then
one would expect the reduced VPFs to look similar.
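
The link between the scaling coefficients and the reduced VPF can be checked with a short numerical sketch. It rests on two assumptions not spelled out in this section: that Eq. 6 has the standard form $\chi = \sum_{p \ge 1} (S_p / p!) (-\bar{N}\bar{\xi})^{p-1}$ with $S_1 = S_2 = 1$, and that the negative binomial model has $S_p = (p-1)!$ (consistent with the $S_3 = 2$ quoted above). The function names are hypothetical.

```python
import math

def chi_series(x, s_p, pmax):
    """Truncated hierarchical sum: chi = sum_{p=1}^{pmax} S_p / p! * (-x)^(p-1)."""
    return sum(s_p(p) / math.factorial(p) * (-x) ** (p - 1)
               for p in range(1, pmax + 1))

def s_negative_binomial(p):
    """Assumed negative binomial coefficients, S_p = (p - 1)!."""
    return math.factorial(p - 1)

x = 0.5                                   # Nbar * xibar; the series needs x < 1
series = chi_series(x, s_negative_binomial, pmax=40)
closed = math.log1p(x) / x                # closed-form negative binomial chi
print(f"series = {series:.10f}   closed form = {closed:.10f}")
```

With these coefficients the sum reduces to $\sum_p (-\bar{N}\bar{\xi})^{p-1} / p = \ln(1 + \bar{N}\bar{\xi}) / \bar{N}\bar{\xi}$. The power series itself converges only for $\bar{N}\bar{\xi} < 1$, so the closed form is what one would compare with data at larger values.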

Perturbation theory with Gaussian initial conditions
predicts values for the $S_p$'s that are universal and depend
only on the local spectral index. They are therefore a known
function of scale. Such scale dependence, however, breaks
the hierarchy in Eq. 3, and therefore the universality of the
scaling in Eq. 6. On the other hand, redshift space distortions
and biasing tend to wash away this scale dependence
(see e.g. Fig. 49 in Bernardeau et al. 2002), an argument
which has been used to explain the good performance of
the scaling hierarchy. But, as shown by Baugh et al. (2004)
and Croton et al. (2004), the measured values of the $S_p$'s do
not seem to match the expectations in either dark matter
models or mock galaxy surveys (both in redshift space). The
reasons for this, and a more physically motivated interpretation
of the negative binomial model, will provide important
constraints to be matched by models of galaxy formation.

APPENDIX A: CORRECTING FOR INCOMPLETENESS IN THE 2DFGRS

The 2dFGRS is spectroscopically incomplete to a small degree,
resulting in missed galaxies (see Colless et al. 2001),
and some spheres used in our counts-in-cells analysis may
straddle the survey boundaries or holes, resulting in missed
volume. Such influences will induce an artificial "voidness"
that will be picked up by our VPF measurements, and any
analysis that neglects these effects will tend to overpredict
the VPF. Thus it is desirable to devise a method with which
one can confidently correct for such incompleteness. This is
not a trivial exercise, since weighting schemes that work with
other statistics (e.g. Efstathiou et al. 1990) cannot necessarily
be applied here, as the VPF will remain uncorrected (how
does one weight no galaxies?). Such techniques will lead to
an under-estimation of the mean density of galaxies and an
over-estimation of the influence of the voids. Ideally, we need
to ensure that any correction faithfully reproduces the full
CPDF of the complete distribution for all orders of galaxy
clustering.

To resolve these problems we have adopted the following
method. When a satisfactory sphere location is found in
the 2dFGRS wedge we project the sphere onto the sky and
estimate, using the survey masks (Colless et al. 2001), the
average completeness $f$ within the sphere. Due to the
incompleteness effects described above we will typically have
$f < 1$. Rather than viewing this incompleteness as missed
galaxies, we consider it as missed volume, and to
compensate we scale the radius of the sphere according to
$R' = R / f^{1/3}$. The enlarged, incomplete sphere then has an
effective volume equal to that of a 100% complete
sphere with the original radius. Galaxies are counted within
the new radius $R'$, but contribute their counts to the scale
$R$. Each sphere we place is individually scaled in this way
according to its local incompleteness, as given by the masks.
We note that, due to our chosen minimum acceptable
completeness of 0.7, spheres are never scaled beyond the radius
bin $R$ under consideration. Thus each correction applies only
to the value of the VPF at each radius point plotted.
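
The radius scaling described above can be sketched as follows. This is a minimal illustration, not the code used for the survey analysis: the function names are hypothetical, the completeness $f$ is taken as an input rather than derived from the survey masks, and galaxy positions are assumed to be Cartesian coordinates in $h^{-1}$ Mpc.

```python
import numpy as np

MIN_COMPLETENESS = 0.7   # minimum acceptable completeness quoted in the text

def corrected_radius(radius, completeness):
    """Scale R to R' = R / f^(1/3), so that f * V(R') equals V(R)."""
    if not MIN_COMPLETENESS <= completeness <= 1.0:
        raise ValueError("sphere rejected: completeness outside [0.7, 1]")
    return radius / completeness ** (1.0 / 3.0)

def count_in_sphere(positions, centre, radius, completeness):
    """Count galaxies within R'; the result is attributed to the scale R."""
    r_eff = corrected_radius(radius, completeness)
    offsets = np.asarray(positions, dtype=float) - np.asarray(centre, dtype=float)
    dist2 = np.sum(offsets**2, axis=1)
    return int(np.count_nonzero(dist2 <= r_eff**2))

# Toy usage: three galaxies on a line, a sphere of R = 1 at the origin.
galaxies = [[0.0, 0.0, 0.0], [1.05, 0.0, 0.0], [2.0, 0.0, 0.0]]
print(count_in_sphere(galaxies, [0.0, 0.0, 0.0], 1.0, completeness=1.0))
print(count_in_sphere(galaxies, [0.0, 0.0, 0.0], 1.0, completeness=0.8))
print(corrected_radius(1.0, MIN_COMPLETENESS))   # largest scaling allowed
```

At the limiting completeness of 0.7 the radius grows by a factor $0.7^{-1/3}$, roughly 13 per cent, which gives a sense of how modest the largest allowed correction is.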

We have tested the robustness of our method by comparing
measurements of the CPDF using a fully sampled,
complete Hubble Volume 2dFGRS mock VLC (Norberg et
al. 2002) with those from the same mock after it has
been made artificially incomplete using the survey masks
(spectroscopically, and including irregular boundaries and
holes) and then corrected. In Fig. 6 we show the results for
$P_N$ vs. radius, where $N = 0$ (the VPF), 2, 6, 20 and 70
(note other $N$'s are omitted for clarity, but all behave
similarly over the scales where the VPF is of interest to us).
Here the points with error bars are the complete $P_N$'s, the
solid lines are the equivalent corrected incomplete $P_N$'s, and
the dashed lines represent the uncorrected incomplete $P_N$'s.
As can be seen, the complete points and corrected lines are
fully consistent, whereas the uncorrected values almost always
lie off the complete points and well outside their error
bars (note the steepness of each curve, which is plotted on
a log scale). The $P_0$ curve in particular demonstrates that
such incompleteness effects must be accounted for to obtain
correct void measurements; simply building volume limited
catalogues is not enough and will lead to an over-prediction
of the scale and frequency of voids in the survey. Our method
can be applied to any counts-in-cells analysis where
incompleteness in the galaxy distribution is present.

Figure 6. Correcting for incompleteness in the 2dFGRS. The
CPDF, $P_N$, for a Hubble Volume 2dFGRS mock VLC in the
magnitude range $-19 > M_{b_J} - 5\log_{10} h > -20$: left to right $N = 0$ (the
VPF), 2, 6, 20 and 70. The points with errors represent the
complete mock, the solid line is the corrected incomplete mock and
the long-dashed line is the uncorrected incomplete mock. Note the
uncorrected incomplete mock always lies outside the error bars.



